Abstract

We present an algorithm for determining the connectivity of a set of N rectangles
in the plane, a problem central to avoiding aliasing in VLSI design rule checkers.
Previous algorithms for this problem either worked slowly with a small amount of
primary memory space, or worked quickly but used more space. Our algorithm uses
O(W) primary memory space, where W, the scan width, is the maximum number
of rectangles to cross any vertical cut. The algorithm runs in O(N lg N) time and
requires no more than O(N) transfers between primary and secondary memory.

Keywords: computational geometry, design rule checking, VLSI, algorithms, rectangles,
connected components, scanning.


1  Introduction 

For  a  VLSI  design  to  be  reliably  produced  as  a  working  chip,  various  features  on  the  chip 
must  be  separated  by  minimum  distances  to  ensure  the  proper  operation  of  transistors 
and  interconnections.  The  design  rule  checker  program  verifies  that  these  and  other 
geometric constraints are satisfied and signals an error if it finds two features that violate
the  design  rules.  For  a  chip  composed  of  millions  of  rectangles,  design  rule  checking  is 
a  time-consuming  process  which  cannot  be  done  entirely  within  the  primary  memory  of 
many  computers. 

This research was supported in part by the Defense Advanced Research Projects Agency under
Contract N000 U 80 C Ofi22.
Figure 1: A set of rectangles with connected components {A, B, D, E, G}, {C, F}, and
{H}. On the right is shown a scan set at the time rectangle E enters. Only active
rectangles (those crossed by the scanline) have an interval in the scan set. The interval
for E will be entered in the scan set S after all processing for its enter event is complete.

This  paper  presents  an  efficient  algorithm  for  finding  the  connected  components  of 
rectangles  in  the  plane  using  a  machine  model  that  incorporates  the  secondary  disk 
memory  where  the  VLSI  design  is  stored.  By  running  this  algorithm  simultaneously  on 
each  layer  of  a  VLSI  chip  design,  a  design  rule  checker  can  determine  which  features  of 
a chip design are electrically equivalent, i.e., are effectively part of the same wire. The
determination  of  electrical  equivalence  allows  the  design  rule  checker  to  avoid  reporting 
the  many  aliasing  errors  that  occur  when  two  electrically  equivalent  features  are  mistaken 
for  electrically  distinct  features.  For  example,  two  wires  might  be  too  close  together,  but 
if  they  are  actually  the  same  wire,  it  does  not  matter. 

Many  VLSI  design  systems  use  rectilinearly  oriented  rectangles  to  represent  the  design 
features.  Two  rectangles  are  electrically  equivalent  if  they  are  connected  by  a  path  of 
intersecting  rectangles.  The  connected  components  problem  is  to  label  each  rectangle 
in  a  design  such  that  two  rectangles  have  the  same  label  if  and  only  if  they  are  in  the 
same  connected  component.  The  set  of  rectangles  in  Figure  1,  for  instance,  has  three 
connected components: {A, B, D, E, G}, {C, F}, and {H}.


Figure  2:  The  computer  model  includes  a  secondary  disk  memory  as  well  as  primary 
memory.  The  connected  components  algorithm,  which  assumes  the  rectangle  database 
is on disk, uses O(N) references to sequential files, O(W) primary memory space, and
O(N lg N) CPU time.

The  connected  components  of  N  rectangles  in  the  plane  can  be  determined  by  an 
algorithm due to Guibas and Saxe [4] in O(N lg N) time, which is remarkable in that
there may be as many as order N² rectangle intersections.¹ Their algorithm uses the
technique  of  scanning,  introduced  by  Shamos  and  Hoey  [8],  which  assumes  that  the 
vertical  edges  of  rectangles  are  initially  sorted  by  x-coordinate.  Scanning  algorithms 
work  by  sweeping  a  scanline  over  a  set  of  geometric  objects  in  the  plane  and  then  working 
primarily  with  the  objects  crossed  by  the  scanline.  In  the  Guibas-Saxe  algorithm,  the 
scanline  is  a  vertical  line  that  sweeps  from  left  to  right  over  the  rectangles.  Unfortunately, 
the  Guibas-Saxe  algorithm  is  designed  to  run  entirely  within  primary  memory,  and  it 
may  cause  disk  thrashing  for  a  large  VLSI  chip. 

In  this  paper,  we  abandon  the  simple  primary  memory  model,  and  instead  use  a 
machine  model  that  includes  a  secondary  disk  memory  as  well  as  primary  memory. 
The  configuration  is  shown  in  Figure  2.  We  assume  that  the  primary  memory  is  a 
fast,  random-access  memory  of  limited  size.  The  set  of  rectangles  is  kept  in  a  file  in 
secondary  disk  storage.  Accesses  to  the  file  are  presumed  to  be  sequential,  either  forward 
or  backward.  More  general  random  accesses  to  disk  blocks  are  unnecessary  for  our 
algorithm. 

This  model  is  used  by  Szymanski  and  Van  Wyk  [9]  for  a  connected  components 
algorithm  which  is  more  suitable  for  large  rectangle  databases  than  the  Guibas-Saxe 
algorithm.  The  Szymanski-Van  Wyk  algorithm  uses  less  primary  memory  than  the 
Guibas-Saxe  algorithm  and  has  locality  of  reference  for  secondary  memory.  The  amount 
of primary memory space used by the algorithm is O(W), where W, the scan width, is
the largest number of rectangles cut by any scanline. In practice, Szymanski and Van
Wyk comment, the size of W is about O(√N). Unfortunately, their algorithm is based
on rectangle intersections, and the running time can be as large as Θ(N²).

This paper presents a connected components algorithm that combines and optimizes
the Szymanski-Van Wyk and the Guibas-Saxe algorithms. It uses O(W) (primary
memory) space and runs in O(N lg N) time in the worst case.

¹ Imai and Asano [5] also have an O(N lg N) connected components algorithm for the primary memory model.

The  algorithm  consists  of  a  two-pass  scan  over  the  set  of  rectangles.  Most  of  the  work 
is  done  in  the  first,  forward  scan.  A  backward  scan  is  then  used  to  produce  the  labeling 
of  rectangles  such  that  two  rectangles  have  the  same  label  if  and  only  if  they  are  in  the 
same connected component. The algorithm maintains four data structures of size O(W)
during  its  forward  scan. 

The  remainder  of  this  paper  presents  the  connected  components  algorithm  and  its 
analysis.  Sections  2.1,  2.2,  2.3,  and  2.4  describe  the  four  data  structures  used  during  the 
forward scan. Section 3 gives the algorithm, Section 4 proves its correctness, and Section 5
analyzes its time and space requirements. Finally, Section 6 offers some concluding
remarks. 

2  Data  Structures 

In scanning algorithms an event is a geometric phenomenon that causes some computation
at the time when it occurs. There are two types of events for a left-to-right scan:
a start event when the scanline crosses the left boundary of a rectangle (the rectangle
becomes active, or enters) and an end event when the scanline crosses the right boundary
of a rectangle (the rectangle becomes inactive, or leaves). Each rectangle has an
associated start event and end event. The four data structures given in this section are
used by the connected components algorithm during scanning.
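As a concrete illustration of the two event types, the following Python sketch turns a list of rectangles into a left-to-right event sequence. The `Rect` record and its field names are hypothetical, and so is the tie-breaking rule that start events sort before end events at equal x; that rule is an assumption chosen so that rectangles which merely touch are treated as intersecting (closed rectangles).

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical record: closed rectangle [x1, x2] x [y1, y2] with a unique id."""
    ident: int
    x1: float
    x2: float
    y1: float
    y2: float


START, END = 0, 1  # assumption: at equal x, starts sort before ends


def events(rects):
    """Return (x, kind, ident) triples sorted for a left-to-right scan."""
    evs = []
    for r in rects:
        evs.append((r.x1, START, r.ident))  # scanline reaches the left edge: r enters
        evs.append((r.x2, END, r.ident))    # scanline reaches the right edge: r leaves
    evs.sort()
    return evs
```

For two rectangles that share the vertical line x = 2, the second one enters before the first one leaves, so the two are simultaneously active at that x.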

2.1  The  rectangle  set 

The  rectangle  set  R  is  a  dynamic  set  that  contains  the  active  rectangles  at  any  point 
during  the  scan.  We  assume  that  each  rectangle  in  the  disk  file  has  a  unique  identification 
number.  When  a  rectangle  enters  primary  memory,  it  is  stored  in  the  set  R  with  the 
identification  number  as  a  key.  The  rectangle  set  can  be  maintained  as  a  balanced  search 
tree, using O(W) space. Each insertion, deletion, or search takes O(lg W) = O(lg N)
time.

2.2  The  scan  set 

The data structure that maintains the scanline for the connected components algorithm
is called the scan set. At any point during the forward scan, the active rectangles can
be represented as a set of vertical intervals, i.e., intervals in y. For example, Figure 1
shows the intervals of the active rectangles at the time rectangle E enters. The scan set
S maintains the dynamic set of intervals that represents the active rectangles.

The scan set allows the connected components algorithm to determine rectangle
intersections easily. Two rectangles intersect if and only if there is a scanline that crosses
both rectangles, and their intervals overlap in the scan set corresponding to the scanline.
This technique for determining rectangle intersections is well known and is used
in previous scan-based algorithms for determining rectangle intersections or connected
components [3,4,9].
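The intersection criterion can be written as a small predicate. In this sketch, rectangles are hypothetical `(x1, x2, y1, y2)` tuples treated as closed sets: the x-test says that some vertical scanline crosses both rectangles, and the y-test is the interval overlap that the scan set detects on that scanline.

```python
def overlaps(lo1, hi1, lo2, hi2):
    """True if the closed intervals [lo1, hi1] and [lo2, hi2] share a point."""
    return lo1 <= hi2 and lo2 <= hi1


def rects_intersect(a, b):
    """Rectangles as (x1, x2, y1, y2) tuples, treated as closed sets."""
    ax1, ax2, ay1, ay2 = a
    bx1, bx2, by1, by2 = b
    # Some scanline crosses both rectangles iff their x-extents overlap;
    # on such a scanline they meet iff their y-intervals overlap.
    return overlaps(ax1, ax2, bx1, bx2) and overlaps(ay1, ay2, by1, by2)
```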

To be precise, a scan set S supports the following operations:

S-INSERT!(A): Add rectangle A to the scan set.

S-DELETE!(A): Remove rectangle A from the scan set.

S-FIND(I): Return a rectangle in the scan set S that overlaps interval I in some way,
and NIL if no rectangles overlap I.

The number of rectangles stored in S at any given time during a scan is at most the
scan width W. We use an interval tree [7], a simple, sparse variation on a balanced tree, to
implement each of the three operations in time O(lg N) and space O(W). Alternatively,
we can achieve the same asymptotic space and time bounds for the above operations
by using McCreight's priority search trees [6] to store rectangles keyed on interval.
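To make the interface concrete, here is a deliberately naive Python stand-in for the scan set. It keeps each active rectangle's interval in a dictionary and answers S-FIND by linear search, so every query costs O(W) rather than the O(lg N) of an interval tree or priority search tree. The class and method names are hypothetical.

```python
class ScanSet:
    """Naive stand-in for the scan set S (O(W) per query, not O(lg N))."""

    def __init__(self):
        self._iv = {}  # rectangle id -> (ybot, ytop) of an active rectangle

    def insert(self, ident, ybot, ytop):  # S-INSERT!(A)
        self._iv[ident] = (ybot, ytop)

    def delete(self, ident):  # S-DELETE!(A)
        del self._iv[ident]

    def find(self, lo, hi):  # S-FIND(I) with I = [lo, hi]
        """Return some active rectangle whose interval meets [lo, hi], else None."""
        for ident, (b, t) in self._iv.items():
            if b <= hi and lo <= t:  # closed-interval overlap
                return ident
        return None  # plays the role of NIL
```

Swapping in an interval tree changes only the internals of `find`; the three operations keep the same meaning.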


2.3  Component  set 

During  the  forward  scan,  the  connected  components  algorithm  maintains  a  component 
set  Q  that  reflects  our  current  knowledge  of  the  connectivity  of  the  active  rectangles. 
Each  component  is  designated  by  a  color,  which  for  convenience  is  represented  as  an 
integer.²

The  rectangle  colorings  within  the  component  set  Q  may  change  with  a  start  event. 
If  a  new  rectangle  connects  two  previously  unconnected  components,  we  merge  them 
within  the  component  set  Q  by  recoloring  active  rectangles  in  the  smaller  of  the  two. 

The  component  set  Q  supports  the  following  operations: 

COLOR!(A): Assigns rectangle A a new (unused) color.

UNCOLOR!(A): Dissociates rectangle A from others of its color. If A is the last of its
color, the color is destroyed (made available for reuse).

COLOR(A): Returns A's color.

REPRESENTATIVE(q): Returns any rectangle having color q ∈ Q. If there is no such
rectangle, return NIL.

RECOLOR!($q_1$, $q_2$): Takes all rectangles of color $q_1$ and color $q_2$ and makes them all either
color $q_1$ or color $q_2$. The other color is destroyed.

² The letter Q is mnemonic for "qonnected qomponents" and "qolor." The first letters of the alphabet
are reserved for rectangles.


We  implement  the  component  set  Q  using  a  vector  in  which  each  color  is  represented 
as  an  index  in  the  vector.  Each  slot  in  the  vector  contains  a  pointer  to  the  first  rectangle 
in  a  doubly  linked  list  of  all  rectangles  of  that  color,  and  the  number  of  rectangles  in  the 
list.  The  pointers  to  implement  the  linked  lists  can  be  stored  with  the  actual  rectangles. 
Each  rectangle  also  stores  the  index  of  its  color.  If  the  number  field  is  zero,  the  color  is 
unused, and we then use the pointer field to implement a free list of the unused colors.
An extra variable is needed to store the head of the free list.

All operations except RECOLOR! can be implemented in constant time. If we always
merge the color with the smaller number of rectangles into the one with the larger
number, then we can do O(N) recolorings in O(N lg N) time. There are at most W
rectangles in the component set Q at any given time, so the data structure need only be
of size O(W).
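The vector-of-colors design can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: a Python `set` per color replaces the doubly linked list threaded through the rectangles, and a list used as a stack of freed indices replaces the free list threaded through the unused slots. RECOLOR! always moves the smaller color into the larger one, as the text prescribes.

```python
class ComponentSet:
    """Sketch of the component set Q; colors are indices into a vector."""

    def __init__(self):
        self.members = []   # color -> set of rectangle ids (empty set = unused)
        self.free = []      # stack of unused color indices available for reuse
        self.color_of = {}  # rectangle id -> color index; COLOR(A) is color_of[A]

    def color(self, a):  # COLOR!(A): give A a fresh color
        q = self.free.pop() if self.free else len(self.members)
        if q == len(self.members):
            self.members.append(set())
        self.members[q].add(a)
        self.color_of[a] = q
        return q

    def uncolor(self, a):  # UNCOLOR!(A)
        q = self.color_of.pop(a)
        self.members[q].discard(a)
        if not self.members[q]:
            self.free.append(q)  # A was the last of its color: recycle it

    def representative(self, q):  # REPRESENTATIVE(q); None plays NIL
        return next(iter(self.members[q]), None)

    def recolor(self, q1, q2):  # RECOLOR!(q1, q2)
        """Merge the smaller color into the larger; return the surviving color."""
        if q1 == q2:
            return q1
        if len(self.members[q1]) < len(self.members[q2]):
            q1, q2 = q2, q1  # now q1 is the larger color
        for a in self.members[q2]:
            self.color_of[a] = q1
        self.members[q1] |= self.members[q2]
        self.members[q2] = set()
        self.free.append(q2)  # the absorbed color is destroyed
        return q1
```

Only the rectangles of the smaller color are touched during a merge, which is what makes the small-into-large rule pay off.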

2.4  Territory  set 

To achieve an O(N lg N) worst case running time for the connected components
algorithm, we must find a way to maintain the component set Q without looking at every
intersection. Figure 3 shows the basic idea. The active rectangles B, C, and D have the
same color, say 1. The new rectangle E intersects all three of these rectangles, which
tells us that rectangle E should be given the same color as rectangle B, all rectangles
with B's color should be merged with rectangles of rectangle C's color, etc. We would
get the same result, however, if we just noticed that rectangle E intersects some
rectangle(s), all of color 1. That is, instead of asking, “What other rectangles does rectangle
E intersect?” we would like to be able to ask, “Is there a color q in the component set
Q such that rectangle E intersects at least one rectangle colored q?” We now describe a
new data structure called a territory set T that allows us to answer this question using
small space and time.

The  territory  set  T  is  a  refinement  of  the  illuminator  data  structure  used  by  Guibas 
and  Saxe  in  their  algorithm  for  the  connected  components  problem  [4].  The  territory  set 
is  essentially  a  colored  partition  of  the  scanline.  Conceptually,  each  territory  has 
two  fields:  its  interval  and  its  color.  The  interval  is  a  closed  interval  in  y.  We  implement 
the  color  indirectly  by  associating  with  each  territory  a  representative  rectangle  which  is 
in  the  territory,  and  therefore  has  the  same  color  as  the  territory.  Each  territory  t  in  T 
obeys  the  following  rules: 

1.  Each  active  rectangle  is  covered  by  exactly  one  territory. 

2.  Each  territory  covers  at  least  one  active  rectangle.  To  ensure  that  the  territory  set 
is  never  empty,  we  assume  there  is  a  dummy  rectangle  above  all  rectangles  in  the 
data  base  that  extends  the  full  length  of  the  design. 

3.  All  active  rectangles  covered  by  territory  t  have  the  same  color  as  t. 
Figure 3: The inefficiencies that can arise from intersection-based connectivity
algorithms. Colors of active rectangles are represented as circled numbers. When rectangle
E enters we would like to know it should have the same color as each rectangle colored
1 without recognizing this fact three different times via intersection checks.


Figure 4: The territory set (right) for a collection of rectangles (left) is essentially a
colored partition of the scanline. Colors of active rectangles and territories are represented
as circled numbers.

For example, in Figure 4 no rectangles go across the boundary between territories $t_1$
and $t_2$. Each territory covers at least one active rectangle. Each active rectangle's color
corresponds to the color of the territory that covers it. Here, rectangles A, E and G and
territory $t_1$ that covers them are colored 17. Rectangles C and F and territory $t_2$ are
colored 42.

The  territory  set  T  supports  the  following  operations: 

T-INSERT!(t): Add territory t to the territory set.

T-DELETE!(t): Delete territory t from the territory set.

LOCATE(y): Return the territory that includes the y-coordinate y. If the point y falls
on the boundary between two territories, the lower of the two is returned.

NEXT(t): Return the territory immediately above territory t.

COLOR(t): Return the color of territory t. This operation involves getting t's
representative rectangle and getting the color from the rectangle.

The territory set T can be implemented as a standard height-balanced tree using
O(W) space. The operations T-INSERT!, T-DELETE!, LOCATE, and NEXT can each be
implemented in O(lg W) = O(lg N) time. As a simple optimization, the territories can
be linked in order, which allows NEXT to run in constant time.
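A compact way to see the LOCATE tie-breaking rule is the Python sketch below. It is an assumption-laden stand-in, not the height-balanced tree of the text: territories are kept in Python lists sorted by their top coordinate and searched with `bisect`, so T-INSERT! and T-DELETE! cost O(W) here, while LOCATE is O(lg W) and NEXT is O(1). Territories are identified by list index, which is only valid until the next insertion or deletion. Because `bisect_left` finds the first top that is at least y, a point on a shared boundary falls in the lower territory, as LOCATE requires.

```python
import bisect


class TerritorySet:
    """Sketch of the territory set T; a territory is (ybot, ytop, rep)."""

    def __init__(self):
        self.tops = []  # ytop of each territory, increasing
        self.terr = []  # (ybot, ytop, rep) in the same order; rep = representative id

    def insert(self, ybot, ytop, rep):  # T-INSERT!
        i = bisect.bisect_left(self.tops, ytop)
        self.tops.insert(i, ytop)
        self.terr.insert(i, (ybot, ytop, rep))
        return i

    def delete(self, i):  # T-DELETE!
        del self.tops[i]
        del self.terr[i]

    def locate(self, y):  # LOCATE(y): the lower territory wins on a boundary
        i = bisect.bisect_left(self.tops, y)
        if i < len(self.terr) and self.terr[i][0] <= y:
            return i
        return None

    def next(self, i):  # NEXT(t): the territory immediately above
        return i + 1 if i + 1 < len(self.terr) else None

    def color(self, i, color_of):  # COLOR(t): the color of t's representative
        return color_of[self.terr[i][2]]
```

With the two territories of Figure 4 stored as [0, 5] (representative A) and [5, 10] (representative C), `locate(5)` returns the lower territory, and `color` reads 17 or 42 through the representatives.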

3  The  Connected  Components  Algorithm 

This section presents the connected components algorithm, which operates in two phases.
The first phase is a forward scan over the rectangles, during which connectivity
information is computed and written out to an intermediate sequential file on disk. The second
phase is a backward scan over the intermediate file, during which component labels are
assigned to each rectangle.

The algorithm assumes that the events, which correspond to the scanline crossing
left or right edges of rectangles, are sorted by x-coordinate. If not, the events must first
be sorted, which takes O(N lg N) time in the worst case. (As a practical matter, we
can often do much better because many VLSI databases already keep rectangles sorted
by left edge.) Given a file sorted by left edge alone, we can sort it into start and end
events in O(N lg N) time and O(W) space using an idea due to Szymanski and Van Wyk
[9]. The idea is to keep a priority queue, such as a heap [1, pp. 147-152], in primary
memory. During the operation of the algorithm, the priority queue holds at most W + 1
rectangles sorted by right endpoint. When a new rectangle is read in, its right endpoint
is stored in the priority queue. Then the priority queue is emptied of all rectangles with
right endpoint smaller than the left endpoint of the new rectangle. For each of these
rectangles, the right endpoint is written out in order as an end event. Then the left
endpoint of the new rectangle is written out as a start event. Thus, without loss of
generality, we can assume the start and end events are presorted.
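The presorting step above maps directly onto Python's `heapq` module. In this sketch the input is a hypothetical stream of `(ident, xleft, xright)` triples already sorted by left edge, and the generator yields start and end events in scan order; the heap never holds more than the rectangles still awaiting their end events. Using a strict `<` when flushing matches "right endpoint smaller than the left endpoint" in the text, so a rectangle that ends exactly where the next one starts is still active when the new one enters.

```python
import heapq


def presort_events(rects_by_left):
    """Yield ('start' | 'end', x, ident) events from rectangles sorted by left edge.

    rects_by_left: iterable of (ident, xleft, xright) with nondecreasing xleft.
    """
    heap = []  # (xright, ident) of rectangles whose end event is still pending
    for ident, xl, xr in rects_by_left:
        heapq.heappush(heap, (xr, ident))
        # Flush every pending rectangle whose right edge lies strictly left of xl.
        # The loop stops at the latest rectangle at the latest, since xr >= xl.
        while heap[0][0] < xl:
            x, j = heapq.heappop(heap)
            yield ('end', x, j)
        yield ('start', xl, ident)
    while heap:  # end of the file: flush the remaining end events in order
        x, j = heapq.heappop(heap)
        yield ('end', x, j)
```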

There are other, more mundane data management issues to be faced in the course of
programming the connected components algorithm described here. Most of these can be
resolved using simple pointer associations; the more complicated ones are addressed
directly in the sections to come.

3.1  The  forward  scan 

The data structures used by the forward scan contain only those rectangles that are
active, which ensures that the O(W) space bound is met, but which also leads to problems
maintaining connectivity across the entire database. When we see an end event for a
rectangle A signaling that A is to become inactive, we are not prepared to give A a final
label, yet we must purge A from our internal data structures. For example, at the time
rectangles A and C in Figure 5 become inactive, there is no way to guess that they are
in the same connected component. Were we to give them final labels now, we would
incorrectly give them distinct labels.

Figure 5: Each rectangle A picks a friend that is active at the time A leaves and is known
to be in the same connected component.

Since  we  cannot  give  each  rectangle  A  a  final  label  in  the  forward  scan,  we  give  it  a 
friend.  Rectangle  A’s  friend  is  another  rectangle  which  (1)  is  active  at  the  time  rectangle 
A leaves, and (2) is known to be in the same connected component as rectangle A. If
there  is  no  such  rectangle  at  the  time  rectangle  A  leaves,  then  its  friend  is  NIL.  Figure  5 
shows  a  possible  assignment  of  friends. 

At the end of the forward pass, each connected component is linked together by a
tree of friend arrows. From this friend information, the back pass can construct final
component labels. The idea is that each friend arrow points from left to right if the
source and destination rectangles are sorted by right edge, or equivalently, by time of
exit. Thus, a component label assigned to the root of the tree will propagate right to
left through the tree during the back scan.

The  start  event 

Processing a start event for rectangle A during the forward scan involves four steps:
setting up, handling top and bottom boundary conditions, recoloring affected rectangles,
and cleaning up.

Set up. Figure 6 shows the important y-coordinates and intervals for the general
case. The bottom and top coordinates of A's interval are designated $y_{bot}$ and $y_{top}$. The
endpoints of the k territories in the territory set T that A overlaps are $y_0, y_1, \ldots, y_k$. The
k territories are gathered into a list L by first using LOCATE to find the territory that
includes $y_{bot}$, and then using NEXT to gather the remaining territories that overlap A's
interval $[y_{bot}, y_{top}]$. All the territories in L are then removed from T, which leaves a gap
in T from $y_0$ to $y_k$. This gap will be repaired in subsequent steps.

Figure 6: The territory set is shown on the left and the rectangles on the right for the
case when k > 1. The colors of territories $t_1, \ldots, t_k$ are a first guess at the colors to
merge because of rectangle A's entrance.
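The gathering of list L can be sketched in a few lines of Python. The representation is an assumption made for illustration: the partition of the scanline is given by an increasing list `tops` of territory top coordinates plus the bottom coordinate of the lowest territory, so territory i spans [tops[i-1], tops[i]] and territories are treated as closed intervals. The `bisect_left` call plays the role of LOCATE($y_{bot}$), and the loop plays the role of repeated NEXT calls.

```python
import bisect


def gather(tops, bottom, ybot, ytop):
    """Indices of the territories whose closed intervals meet [ybot, ytop].

    Assumed representation: territory i spans [tops[i-1], tops[i]] and
    territory 0 spans [bottom, tops[0]], so together they partition the scanline.
    """
    i = bisect.bisect_left(tops, ybot)  # LOCATE(ybot): the lower one on a tie
    L = []
    while i < len(tops):
        lo = tops[i - 1] if i > 0 else bottom
        if lo > ytop:  # this territory lies entirely above A's interval
            break
        L.append(i)
        i += 1  # NEXT
    return L
```

Whether a territory that only touches A's interval at $y_{top}$ belongs in L depends on the boundary convention; this sketch includes it, and the top boundary step then decides whether to keep it.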

Intuitively,  the  colors  of  the  territories  in  list  L  represent  our  first  guess  at  which 
colors  must  be  merged  due  to  the  entrance  of  rectangle  A.  Since  each  territory  contains 
at  least  one  active  rectangle,  the  territories  in  the  middle  of  the  list  will  necessarily 
contain  a  rectangle  that  intersects  A. 

Handle boundary conditions. Rectangle A extends only partially into the top and
bottom territories, so we must explicitly reference the scan set S to determine whether
there are active rectangles in these two territories that intersect A. We describe only
the handling of the top boundary condition since the bottom boundary condition is
symmetric. Also, for simplicity, we shall consider the special case k = 1 (Figure 7) after
we deal with the general case k ≥ 2 (Figure 6).

Handling the top boundary condition for k ≥ 2 involves determining whether the top
territory should be kept in list L. The first case is when there is some active rectangle B
that intersects the interval $[y_{k-1}, y_{top}]$. The interval of rectangle B falls entirely within
the top territory, so it follows that A, B, and every other active rectangle covered by the
top territory must have the same color by the time we finish processing the entrance of
A. Therefore, we leave the top territory in the list L, and nothing further needs to be done.

Otherwise, no active rectangle intersects A in the top territory, and the top territory
is removed from L. Since k is at least 2, there must be an active rectangle in the interval
$[y_{top}, y_k]$ because the top territory must contain at least one rectangle, and the interval
$[y_{k-1}, y_{top}]$ contains none. Therefore, we can return the top territory to the territory set
with the shortened interval $[y_{top}, y_k]$ without violating any of the properties a territory
must have. In other words, chopping off empty space does not hurt.
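The two cases of the top boundary condition for k ≥ 2 can be expressed as a small decision function. In this hypothetical sketch the caller passes the intervals of the active rectangles covered by the top territory (a stand-in for querying the scan set S over $[y_{k-1}, y_{top}]$); restricting the check to covered rectangles is our assumption, made so that a rectangle from the territory below that merely touches $y_{k-1}$ is not mistaken for one in the top territory.

```python
def handle_top(top_terr, ytop, covered):
    """Top boundary condition of a start event when k >= 2 (sketch).

    top_terr: (ylo, yhi), the interval [y_{k-1}, y_k] of the topmost territory in L.
    ytop:     top coordinate of the entering rectangle A (ylo < ytop <= yhi).
    covered:  (ybot, ytop) intervals of the active rectangles covered by that
              territory; each satisfies ylo <= ybot.
    """
    ylo, yhi = top_terr
    if any(b <= ytop for b, _ in covered):
        # Some rectangle B reaches down into [y_{k-1}, ytop] and so meets A:
        # keep the territory in L so that its color is merged with A's.
        return ('keep', None)
    # No covered rectangle meets A, so all of them lie inside [ytop, y_k]:
    # drop the territory from L and return it to T with the shortened interval.
    return ('trim', (ytop, yhi))
```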

We now discuss the processing of the top boundary condition for the special case
when k = 1 (Figure 7), since once again, the bottom boundary condition is symmetric.
If rectangle A intersects some active rectangle, then we are done. If rectangle A does not
intersect any active rectangle, it is possible that the rectangle that justified the existence
of the single territory in list L is below rectangle A, instead of above. In this case, we
must explicitly query the scan set S with the interval $[y_{top}, y_1]$ to determine whether
there is an active rectangle to justify putting a territory over the interval. If there is
an active rectangle, we must enter a new territory into T with the shortened interval
$[y_{top}, y_1]$ using the color of the old territory.

Recolor.  Now,  the  colors  of  the  territories  in  list  L  are  exactly  the  colors  that  must  be 
merged  because  of  rectangle  A’s  entrance.  We  first  color  rectangle  A  with  a  new  color.  We 
then merge A's color with the color of each territory in L. The colors are automatically
garbage  collected  by  the  component  set  Q.  Because  of  our  pointer  implementation  of 
territory  colors,  no  territory  is  ever  colored  with  a  garbage-collected  color. 

Clean  up.  We  finish  the  servicing  of  rectangle  A’s  entrance  by  repairing  the  territory 
set  T  and  making  A  active.  The  gap  left  after  handling  boundary  conditions  becomes 
the interval of a new territory with the color of rectangle A. We insert rectangle A into
the rectangle set R and the scan set S. Since the left side of a rectangle indicates an
end  event  in  the  back  scan,  we  enter  an  end  event  for  rectangle  A  in  the  intermediate 
file  that  will  serve  as  input  to  the  back  scan. 

The  end  event 

Servicing an end event for rectangle A requires us first to find the associated rectangle
object for A in the rectangle set R. Then, we must output a start event for A in the
back pass and fix up the internal data structures. We accomplish this processing in three
steps: making rectangle A inactive, associating A with a friend, and fixing the territory
set T.

Make A inactive. Let q be rectangle A's color before processing the end event. We
uncolor rectangle A, and remove it from the scan set S and the rectangle set R.

Find a friend. We query the component set Q for a representative of color q and
associate this representative rectangle (possibly NIL) with rectangle A so that A can now
tell its friend when asked. We write out this information as a start event for rectangle A
for use in the back scan. We shall say that rectangles that receive NIL as a friend have
no friend or are friendless.

Fix the territory set. We pick any point on rectangle A's interval, and use LOCATE to
find the one territory t that covers rectangle A. We then find a rectangle B in the
scan set S that intersects t's interval to see if there is some active rectangle to justify
t's existence. (Recall that a territory must cover at least one active rectangle.) If no
rectangle exists, then A is the last active rectangle in t's interval, and territory t can be
eliminated by extending the interval of the next territory above t to include t's interval.

If the existence of territory t is justified by some active rectangle B, and A is serving
as the representative for territory t, then we make rectangle B the representative of
territory t.
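The territory repair at the end of an end event can be sketched as follows. Assumptions: territories are hypothetical mutable `[ybot, ytop, rep]` lists sorted by y, the dictionary of surviving active intervals stands in for the scan set S (with A already removed), and the search is restricted to rectangles lying inside t's interval, i.e. those covered by t, so that a rectangle of a neighboring territory that touches the shared boundary is not taken as a justification. The dummy rectangle above the whole design guarantees that a territory above t exists whenever t must be absorbed.

```python
def fix_territory(terrs, i, a, active):
    """Repair territory i after rectangle a leaves (sketch).

    terrs:  list of [ybot, ytop, rep] sorted by y, partitioning the scanline.
    active: dict id -> (ybot, ytop) of the rectangles still active (a removed).
    """
    lo, hi, rep = terrs[i]
    # Look for a surviving rectangle B covered by t to justify t's existence.
    b = next((r for r, (yb, yt) in active.items() if lo <= yb and yt <= hi), None)
    if b is None:
        # a was the last active rectangle in t: the territory above absorbs t.
        terrs[i + 1][0] = lo
        del terrs[i]
    elif rep == a:
        terrs[i][2] = b  # B takes over as t's representative
```

Any survivor covered by t already has t's color (rule 3), so handing it the representative role does not change the territory's color.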

3.2  The  back  scan 

The  second  phase,  the  back  scan,  passes  backwards  through  the  intermediate  file  of 
rectangle-friend  information  created  in  the  forward  scan,  and  produces  a  final  file  of 
rectangle-label  pairs  which  will  be  sorted  by  left  edge.  During  this  right-to-left  scan, 
each  rectangle  receives  its  final  labeling  from  its  friend.  The  back  scan  uses  only  one 
data  structure,  the  rectangle  set  R.  It  also  requires  a  counter  initialized  to  0. 

During  the  back  scan,  the  rectangle  set  R  holds  all  active  rectangles,  and  each  active 
rectangle  knows  its  final  component  label.  Labels  are  assigned  sequentially  during  the 
back  scan,  and  the  counter  holds  the  value  of  the  next  label  to  be  assigned. 

The  start  event 

The first step in servicing a start event for rectangle A is to assign a final label to A. If
A has no friend, it is the rightmost rectangle in its component, and so a new label must
be assigned from the counter. We store this label into A and increment the counter.

Otherwise, find rectangle A's friend in the rectangle set R, and give A the same
label as its friend. Rectangle A's friend must be active since A and its friend were
simultaneously active in the forward scan. Rectangle A left first in the forward scan, so it
must enter after its friend in the back scan. Finally, we add rectangle A to the rectangle
set R.


The  end  event 


Processing an end event for a rectangle A consists of simply removing A from the
rectangle set R and writing out rectangle A with its label to a final file. No rectangle that
subsequently enters has A as a friend because the two rectangles are not simultaneously
active. Thus, no other rectangle will need to get a label from A, and hence, it is safe to
remove A. The final file is sorted by left edge from right to left. Reversing the file leaves
it sorted left to right by left edge, as was the original input file.
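The back scan is short enough to sketch in full. The record format is an assumption: the intermediate file is modeled as a Python list in forward order, holding `('end', a, None)` records (written when a entered in the forward scan) and `('start', a, friend)` records (written when a left, with `friend` possibly `None`). Reading the list backwards, a dictionary plays the rectangle set R and maps each active rectangle to its final label.

```python
def back_scan(records):
    """Assign final labels by reading the intermediate file backwards (sketch).

    records: forward-order list of ('start', a, friend) and ('end', a, None).
    Returns (rectangle, label) pairs in the order they are written out.
    """
    R = {}        # rectangle set: active rectangle -> its final label
    counter = 0   # the next label to hand out
    out = []
    for kind, a, friend in reversed(records):
        if kind == 'start':
            if friend is None:
                R[a] = counter  # rightmost rectangle of its component
                counter += 1
            else:
                R[a] = R[friend]  # the friend is guaranteed to be active here
        else:
            out.append((a, R.pop(a)))  # write a and its label to the final file
    return out
```

In a small example where A and C both leave before their common friend B, and D is isolated, all of A, B, and C inherit B's label while D gets a label of its own.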


4  Proof  of  correctness 

This section shows that two rectangles get the same label if and only if they are in the
same connected component.

(⇒) We first show that if two rectangles are given the same label, then they are in the
same connected component. We prove this by induction on the number of rectangles
given the same label. Suppose rectangle A is the first rectangle given label l. Then
at the time we process rectangle A's start event during the back scan, rectangle A is
friendless and the value of the counter is l. If rectangle A had a friend, it would be given
the same label as its friend, contradicting our assumption that rectangle A was the first
to receive its label. The counter is incremented after rectangle A is given the label l,
so no friendless rectangle that enters after A gets the label l. Since the counter only
increases, no friendless rectangle that enters before rectangle A is given the label l either.

Assume that at some point in the back scan, k rectangles have been given label l and
all k are in the same connected component. Some j ≤ k of these rectangles are active
(i.e., are in the rectangle set R). Now the start event for some rectangle B causes B to
get label l. For this to happen, rectangle B must have a friend rectangle C which is one
of the j active rectangles with label l. Since rectangle C is rectangle B's friend, both
rectangles B and C must have had the same color in the component set Q at the time
rectangle B left in the forward scan.

To  finish  the  argument,  we  show  that  two  rectangles  simultaneously  having  the  same 
color  in  the  component  set  Q  are  in  the  same  connected  component.  If  this  is  true,  then 
rectangles  B  and  C  are  in  the  same  connected  component  and  therefore  by  transitivity 
rectangle  B  is  in  the  same  connected  component  as  the  other  k  rectangles  given  label  l. 

We  show  that  two  rectangles  sharing  a  color  in  the  component  set  Q  must  be  in  the 
same  connected  component  by  induction  on  the  number  of  rectangles  with  that  color. 
A new color is introduced into the component set Q only when a Color! operation
is performed upon a rectangle A during its start event. Hence each color begins with
only one member rectangle. Other rectangles join a color only through the Recolor!
operations performed during the processing of the start event for a rectangle.

Assume that before processing the start event for a rectangle A, all rectangles with
the same color in the component set Q are in the same connected component. After
handling the boundary conditions, there are m ≥ 0 territories in the list L. These
territories have n ≤ m distinct colors q_1, q_2, ..., q_n, which are all merged into one final
color q. The colors q_1, q_2, ..., q_n are exactly those colors for which at least one member
rectangle intersects rectangle A. Each pair of rectangles in the final color q is connected
by a path of intersecting rectangles. If they shared a color q_i in the component set Q
before rectangle A entered, then by assumption there is a path connecting them that
includes only rectangles originally colored q_i. Otherwise, they came from different colors
(or one of them is A itself); each of those colors contains a rectangle that intersects A,
and by assumption each rectangle is connected within its old color to such a rectangle, so
there is a path between the two rectangles that passes through rectangle A. Therefore all
rectangles now colored q are in the same connected component.

(<=) We now prove that if two rectangles are in the same connected component, then
they  get  the  same  label.  It  suffices  to  show  that  if  two  rectangles  intersect,  they  get  the 
same  label  because  then  all  rectangles  in  the  same  connected  component  get  the  same 
label  by  transitivity.  The  proof  has  two  parts.  First,  we  argue  that  if  two  rectangles 
intersect,  then  during  the  forward  scan  they  have  the  same  color  in  the  component  set 
Q  while  they  are  both  active.  Next  we  show  that  if  two  rectangles  are  simultaneously 
active  and  have  the  same  color  in  the  component  set  Q,  then  they  get  the  same  label. 

To  show  that  if  two  rectangles  A  and  B  intersect,  they  have  the  same  color  in  the 
component  set  Q  while  they  are  both  active,  assume  without  loss  of  generality  that 
rectangle  B  enters  after  rectangle  A.  Let  t  be  the  territory  in  the  territory  set  T  that 
covers rectangle A at the time rectangle B enters. Since rectangles A and B intersect,
territory t must at least partially cover rectangle B, so it is gathered into the list L in
the first step of the processing of the start event for rectangle B. The presence of the
active rectangle A intersecting rectangle B guarantees that after the boundary condition
checks, territory t is still in the list L. Therefore after the merging, rectangles A and
B have the same color in the component set Q. From that point on they always move
together in any recolorings, so they always have the same color until one of them leaves.

To show that if two rectangles have the same color in the component set Q while they
are active, then they get the same final label, suppose a rectangle A is about to leave,
and consider the set of rectangles that have the same color as rectangle A. Rectangle A
chooses  one  as  a  friend  (shown  by  an  arrow  in  Figure  8).  Later,  other  rectangles  may 
join  this  set  through  merges.  As  each  rectangle  in  the  set  leaves,  it  chooses  a  friend  from 
among  those  left  in  the  set.  Eventually  an  exiting  rectangle  finds  itself  alone,  and  it 
exits  without  a  friend.  For  example,  Figure  8  illustrates  the  sequence  of  friend  choices 
for  one  set  of  rectangles  taken  from  the  example  in  Figure  5.  During  the  forward  scan 
each  of  these  rectangles  simultaneously  shares  a  color  in  the  component  set  Q  with  at 
least one other rectangle in the set. For example, rectangles A and B share a component
immediately after rectangle B enters, and rectangles D, E, and F share a component
immediately after rectangle F enters.
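The friend choice at a forward-scan end event can be sketched with a toy component set. The class and method names below are hypothetical stand-ins for Color!, Recolor!, Uncolor!, and Representative; the paper's balanced-tree implementation is replaced by Python dicts and sets, and the order of operations in `leave` (remove the rectangle, then ask for a representative of what remains) is an assumption consistent with the description above.

```python
from typing import Dict, Hashable, Optional, Set


class ComponentSet:
    """Toy component set Q: every active rectangle carries a color (hypothetical API)."""

    def __init__(self) -> None:
        self.color_of: Dict[Hashable, int] = {}      # active rectangle -> color
        self.members: Dict[int, Set[Hashable]] = {}  # color -> active rectangles with it
        self.next_color = 0

    def color(self, rid: Hashable) -> int:
        """Color!: give a newly entering rectangle a brand-new color."""
        c = self.next_color
        self.next_color += 1
        self.color_of[rid] = c
        self.members[c] = {rid}
        return c

    def recolor(self, src: int, dst: int) -> None:
        """Recolor!: move every member of color src into color dst."""
        if src == dst:
            return
        for rid in self.members.pop(src):
            self.color_of[rid] = dst
            self.members[dst].add(rid)

    def uncolor(self, rid: Hashable) -> int:
        """Uncolor!: remove a leaving rectangle and return its former color."""
        c = self.color_of.pop(rid)
        self.members[c].discard(rid)
        return c

    def representative(self, c: int) -> Optional[Hashable]:
        """Representative: some remaining member of color c, or None if c is empty."""
        group = self.members.get(c)
        return min(group) if group else None  # min() only makes the choice deterministic


def leave(q: ComponentSet, rid: Hashable) -> Optional[Hashable]:
    """Forward-scan end event: the leaving rectangle picks a friend of its own color."""
    c = q.uncolor(rid)
    friend = q.representative(c)
    if friend is None:
        del q.members[c]  # the color has no active members left
    return friend
```

With two rectangles "A" and "B" merged into one color, A leaving first picks B as its friend, and B, now alone, leaves friendless.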

We can view the illustration in Figure 8 as a directed graph with the rectangles as
vertices and the friend-relation arrows as edges. Every vertex has out-degree one
except for a single sink, the friendless rectangle H. The graph is acyclic, because a
rectangle's friend is still active when the rectangle leaves and therefore leaves later
in the forward scan. Hence if we start at any vertex and follow the edges, we always end
up at the sink. We know from our previous argument that a rectangle gets the same label
as its friend. That friend in turn gets the same label as its friend, and so on down the
friend links, until we reach the sink H. By transitivity, any two rectangles that are in
the same component of the component set Q while active get the same final label.

Figure 8: An example of a component taken from Figure 5. The arrows represent the
friend relation. Each rectangle shares a component color with at least one other rectangle
in this set during the forward scan. During the back scan, these rectangles enter from
right to left. Rectangle H receives a new label, and all the other rectangles receive their
labels indirectly from rectangle H.
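The chain argument can be checked mechanically on any friend relation. The sketch below (hypothetical helpers `sink_of` and `labels_from_friends`) follows friend links to the sink and gives every rectangle its sink's label. It does not reproduce the back scan's counter order, only the resulting partition into labels, and it raises an error if the friend relation is not acyclic.

```python
from typing import Dict, Hashable, Optional


def sink_of(rid: Hashable, friend: Dict[Hashable, Optional[Hashable]]) -> Hashable:
    """Follow friend links from rid until reaching a friendless rectangle (the sink)."""
    seen = set()
    while friend[rid] is not None:
        if rid in seen:
            raise ValueError("friend relation contains a cycle")
        seen.add(rid)
        rid = friend[rid]
    return rid


def labels_from_friends(friend: Dict[Hashable, Optional[Hashable]]) -> Dict[Hashable, int]:
    """Give every rectangle the label of its sink; each sink gets one new label."""
    sink_label: Dict[Hashable, int] = {}
    labels: Dict[Hashable, int] = {}
    for rid in friend:
        s = sink_of(rid, friend)
        labels[rid] = sink_label.setdefault(s, len(sink_label))
    return labels
```

For instance, with friend links 1 -> 2 -> 5 and 3 -> 5, and rectangles 4 and 5 friendless, rectangles 1, 2, 3, and 5 share one label and rectangle 4 gets another.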

5  Analysis 

This section shows that the worst-case running time of the connected components algorithm
is O(N lg N), the amount of primary memory required is O(W), and the number
of transfers between primary and secondary memory is O(N). We have already seen that
each data structure requires only O(W) primary memory space, and it can be verified
that the number of disk transfers is O(N). Thus, we must demonstrate that the running
time of the algorithm is O(N lg N).

The rectangle set R and the scan set S each contribute only O(N lg N) to the overall
time. The rectangle set R is used in both the forward scan and the back scan. It
contributes only O(N lg N) to the time in each phase, since it performs at most two
operations, each requiring O(lg N) time, on each of the O(N) start and end events. The
scan set S performs one insertion or deletion and at most four S-Find operations for
each start or end event, so with O(lg N) time per operation it too contributes O(N lg N).

Operations on the territory set T contribute O(N lg N) time as well. During the
servicing of an end event, the territory set T performs at most one Locate, two
T-Delete!s, and one T-Insert!, if we regard the modification of a territory interval
as a deletion followed by an insertion. For a start event, only one Locate and at
most three T-Insert!s are performed. The number of calls to Next, T-Delete!, and
Color directly depends on the size of the list L, however.

We shall show that each operation is performed at most O(N) times. The operations
that are performed a constant number of times for a given event are executed O(N)
times overall. The other operations are called once for each time a territory appears
in a list L during a start event. Thus, showing that the sum total of the sizes of L
throughout the entire forward pass is O(N) will produce our desired bound. The total
number of insertions into T is at most 4N (at most three per start event and one per
end event, over N events of each kind), which therefore bounds the total number of
deletions. Moreover, each of these territories can participate in a list L only once, since
it is deleted from T at that time and replaced by the consolidated territory or a new
boundary territory. Hence, the sum total of the lengths of L is O(N), which also bounds
the number of times any operation is performed. Since each operation costs O(lg N)
time, the total work performed on the territory set is O(N lg N).
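The counting argument for the territory set can be summarized as follows, using the per-event bounds stated above. The symbols I_T and D_T (the numbers of insertions into and deletions from T over the whole forward scan) are notation introduced here only for this summary, and T is assumed to start empty.

$$
I_T \le \underbrace{3N}_{\text{start events}} + \underbrace{N}_{\text{end events}} = 4N, \qquad D_T \le I_T \le 4N,
$$

$$
\sum_{\text{start events}} |L| \;\le\; I_T \;\le\; 4N,
$$

$$
\text{time}(T) \;=\; \Bigl(O(N) + \sum_{\text{start events}} |L|\Bigr)\cdot O(\lg N) \;=\; O(N \lg N).
$$

The middle bound holds because every territory placed in a list L is deleted from T when L is processed, so each inserted territory appears in at most one list.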

It remains to analyze the component set Q. Each start event causes one Color!
operation, and each end event causes one Uncolor! and one Representative operation.
Using the same arguments as above for the territory set, at most O(N) Recolor!
and Color operations are performed throughout the whole forward scan. Thus, its
contribution to the overall running time of the connected components algorithm is also
O(N lg N).


6  Conclusions 


This  section  presents  the  important  extension  of  the  connected  components  algorithm  to 
multiple  layers.  We  also  discuss  some  alternative  implementations  of  the  data  structures 
which  may  be  better  suited  to  a  practical  implementation. 


The  connected  components  problem  of  rectangles  in  the  plane  presented  in  this  paper 
is  a  simplification  of  the  problem  faced  in  computer-aided  design  of  VLSI.  Computing 
the  electrically  equivalent  rectangles  in  multiple  planar  layers  of  a  VLSI  design  is  not 
much  more  difficult  than  the  one-layer  problem  discussed  in  this  paper,  even  though 
contact cuts can allow components to snake up and down among layers.

To  find  the  connected  components  of  rectangles  on  multiple  layers,  we  simply  run 
a  copy  of  the  basic,  one-layer,  connected  components  algorithm  on  each  layer.  In  the 
forward scan, each layer is given its own scan set, rectangle set, and territory set. The
component set, however, is global to the entire computation. Each contact is represented
explicitly on the layers it intersects. In the back scan, both the counter and the rectangle
set are global. No further changes are necessary.

Some  of  the  data  structures  necessary  for  the  connected  components  algorithm  can 
be implemented more practically than with the asymptotically efficient height-balanced
trees presented in the body of the paper. The rectangle set R, for example, can be
implemented by hashing on the rectangle identification number, which would lead to
good average-case behavior. At the cost of a bit more complication, the component set
Q can be implemented with a union-find structure that allows O(N) merges in almost
linear time [10]. These modifications improve the expected time performance of some of
the bookkeeping operations independent of the statistical distribution of rectangles.
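A union-find structure of the kind suggested for Q might look like the following sketch, which uses union by size and path halving. It is a standard disjoint-set forest, not necessarily the variant described in [10], and the class name is hypothetical. One caveat: a plain union-find cannot delete elements, so Uncolor! and Representative would need extra bookkeeping that this sketch omits. For R, Python's built-in dict is already a hash table and could be keyed by the rectangle identification number.

```python
from typing import Dict, Hashable


class UnionFind:
    """Disjoint-set forest with union by size and path halving."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def make_set(self, x: Hashable) -> None:
        """Create a singleton set (analogous to giving x a new color)."""
        self.parent[x] = x
        self.size[x] = 1

    def find(self, x: Hashable) -> Hashable:
        """Return the root of x's set, halving the path along the way."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # skip to grandparent
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> Hashable:
        """Merge the sets of a and b; the larger set's root survives."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra
```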
With some assumptions about the distribution of rectangles, the scan set S and the
territory  set  T  can  also  be  implemented  more  efficiently  using  bins,  as  has  been  done 
for  other  VLSI  algorithms  [2].  Each  bin  represents  a  fixed  portion  of  the  scanline  and 
contains  a  pointer  to  the  list  of  objects  that  overlap  that  bin.  A  desirable  bin  size 
can  be  chosen  based  on  statistical  information  about  the  VLSI  design.  The  worst-case 
performance  of  the  algorithm  may  be  diminished,  however,  because  long,  tall  rectangles 
are  split  across  many  bins,  and  consequently,  the  practical  difference  between  this  binning 
approach  and  Szymanski  and  Van  Wyk’s  approach  [9]  may  be  negligible. 
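A binned scanline might be organized as in the sketch below. `BinnedScanline`, its methods, and the fixed `bin_width` parameter are hypothetical, and intervals are closed ranges [lo, hi] along the scanline. The `insert` method makes the worst-case concern concrete: an interval is copied into every bin it spans, so a tall rectangle costs time proportional to the number of bins it covers.

```python
import math


class BinnedScanline:
    """Scanline [y_min, y_max) cut into fixed-width bins; each bin lists overlapping intervals."""

    def __init__(self, y_min: float, y_max: float, bin_width: float) -> None:
        self.y_min = y_min
        self.bin_width = bin_width
        self.nbins = max(1, math.ceil((y_max - y_min) / bin_width))
        self.bins = [set() for _ in range(self.nbins)]
        self.intervals = {}  # key -> (lo, hi)

    def _bin_range(self, lo: float, hi: float) -> range:
        """Indices of the bins touched by [lo, hi], clamped to the valid range."""
        first = int((lo - self.y_min) // self.bin_width)
        last = int((hi - self.y_min) // self.bin_width)
        first = min(max(first, 0), self.nbins - 1)
        last = min(max(last, 0), self.nbins - 1)
        return range(first, last + 1)

    def insert(self, key, lo: float, hi: float) -> None:
        self.intervals[key] = (lo, hi)
        for b in self._bin_range(lo, hi):
            self.bins[b].add(key)  # a tall interval is copied into every bin it spans

    def delete(self, key) -> None:
        lo, hi = self.intervals.pop(key)
        for b in self._bin_range(lo, hi):
            self.bins[b].discard(key)

    def overlapping(self, lo: float, hi: float) -> set:
        """Keys of stored intervals that intersect the closed query range [lo, hi]."""
        found = set()
        for b in self._bin_range(lo, hi):
            for key in self.bins[b]:
                klo, khi = self.intervals[key]
                if klo <= hi and lo <= khi:
                    found.add(key)
        return found
```

Only the bins covering the query range are examined, so short queries are cheap when intervals are short; the bin width is the tuning knob that statistical information about the design would set.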